Backpropagating Linearly Improves Transferability of Adversarial Examples (Supplementary Material)
Empirical results in Section 3.1 of the main paper show that simply removing ReLUs leads to improved transferability. In this section, we try freezing all learnable parameters in the unmodified sub-net h during fine-tuning, and a similar observation about the initial improvement of transferability can still be made (see Figure 5). The classification loss of these modified VGG-19 models on the benign CIFAR-10 test set is also reported, in Figure 6. On ImageNet, it is evaluated on the 50,000 official validation images. As mentioned in the main paper, many recent successes in improving adversarial transferability benefit from maximizing intermediate-level distortions rather than the final prediction losses [8, 3, 2] of DNNs.
Backpropagating Linearly Improves Transferability of Adversarial Examples
The vulnerability of deep neural networks (DNNs) to adversarial examples has drawn great attention from the community. In this paper, we study the transferability of such examples, which lays the foundation of many black-box attacks on DNNs. We revisit a not so new but definitely noteworthy hypothesis of Goodfellow et al. and disclose that the transferability can be enhanced by improving the linearity of DNNs in an appropriate manner. We introduce linear backpropagation (LinBP), a method that performs backpropagation in a more linear fashion using off-the-shelf attacks that exploit gradients. More specifically, it computes the forward pass as normal but backpropagates the loss as if some nonlinear activations were not encountered in the forward pass. Experimental results demonstrate that this simple yet effective method clearly outperforms the current state of the art in crafting transferable adversarial examples on CIFAR-10 and ImageNet, leading to more effective attacks on a variety of DNNs.
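The forward-as-normal, backward-as-linear idea can be illustrated on a toy network. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a two-layer network with a single ReLU, and the names `forward` and `input_gradient` are hypothetical. The only difference between the two modes is whether the ReLU mask is applied during the backward pass.

```python
import numpy as np

def forward(x, W1, W2):
    """Ordinary forward pass: linear -> ReLU -> linear."""
    z1 = W1 @ x
    a1 = np.maximum(z1, 0.0)
    return z1, W2 @ a1

def input_gradient(x, W1, W2, grad_out, linear_backward=False):
    """Gradient of the loss w.r.t. the input x.

    grad_out is dLoss/dOutput. With linear_backward=True the ReLU is
    treated as the identity in the backward pass only (an illustrative
    stand-in for the LinBP idea); the forward pass is unchanged.
    """
    z1, _ = forward(x, W1, W2)
    grad_a1 = W2.T @ grad_out
    if linear_backward:
        grad_z1 = grad_a1                # skip the ReLU mask
    else:
        grad_z1 = grad_a1 * (z1 > 0)     # standard ReLU derivative
    return W1.T @ grad_z1

W1 = np.eye(2)
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, -1.0])
g = np.array([1.0])

standard = input_gradient(x, W1, W2, g)
linear = input_gradient(x, W1, W2, g, linear_backward=True)
```

In this toy case the second input coordinate is blocked by the inactive ReLU under standard backpropagation but still receives gradient under the linear backward pass. A gradient-based attack would then use this modified gradient in place of the usual one.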
Refining Language Models with Compositional Explanations
Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such spurious patterns via post-hoc explanation algorithms which compute the importance of input features. Further, the model is regularized to align the importance scores with human knowledge, so that the unintended model behaviors are eliminated. However, such a regularization technique lacks flexibility and coverage, since only importance scores towards a pre-defined list of features are adjusted, while more complex human knowledge such as feature interaction and pattern generalization can hardly be incorporated. In this work, we propose to refine a learned language model for a target domain by collecting human-provided compositional explanations regarding observed biases. By parsing these explanations into executable logic rules, the human-specified refinement advice from a small set of explanations can be generalized to more training examples. We additionally introduce a regularization term allowing adjustments for both importance and interaction of features to better rectify model behavior. We demonstrate the effectiveness of the proposed approach on two text classification tasks by showing improved performance in the target domain as well as improved model fairness after refinement.
A.1 Conjugate Derivations Cross-Entropy Loss: L(h,y) = cX
$\sum_{i=1}^{c} y_i = 1$ is satisfied, otherwise $f^*(y) = \infty$ by duality. A.2 Experiments on Binary Classification with Exponential Loss. Here we present the results on a binary classification task over a synthetic dataset of 100-dimensional Gaussian clusters. For $\Sigma$, similar to [23], we sample a diagonal matrix $D$, where each entry is sampled uniformly from a specified range, and a rotation matrix $U$ from a Haar distribution, giving $\Sigma = U D U^\top$. For the source data, we sample $\mu_{-1}^{s}, \mu_{+1}^{s}, \Sigma_{-1}^{s}, \Sigma_{+1}^{s}$ as specified above with $k = 0$. Now, to create distribution-shifted data of various severity, we sample $\mu_{-1}^{t}, \mu_{+1}^{t}, \Sigma_{-1}^{t}, \Sigma_{+1}^{t}$ as specified above with $k = 1$, which are then used to sample the shifted data as follows: Exponential Loss for Binary Classification. Let $z$ be the classification score $h_\theta(x)$. For the logistic training loss, the conjugate adaptation loss would default to entropy with sigmoid probability.
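The covariance construction $\Sigma = U D U^\top$ described above can be sketched in NumPy. This is an illustrative sketch under stated assumptions: the range `low`/`high` for the diagonal entries is hypothetical (the text only says "a specified range"), the function names are invented here, and the Haar rotation is drawn with the standard QR-based recipe rather than whatever routine the authors used.

```python
import numpy as np

def sample_haar_rotation(dim, rng):
    """Draw a rotation matrix (det = +1) from the Haar measure.

    QR of a Gaussian matrix, with the signs of R's diagonal folded
    into Q, gives a Haar-distributed orthogonal matrix; flipping one
    column when det = -1 maps it to a rotation.
    """
    A = rng.standard_normal((dim, dim))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

def sample_covariance(dim, low, high, rng):
    """Sigma = U D U^T with D diagonal, entries uniform in [low, high]."""
    D = np.diag(rng.uniform(low, high, size=dim))
    U = sample_haar_rotation(dim, rng)
    return U @ D @ U.T

rng = np.random.default_rng(0)
U = sample_haar_rotation(5, rng)
Sigma = sample_covariance(5, low=0.5, high=2.0, rng=rng)  # range is an assumption
```

Because $U$ is orthogonal, the eigenvalues of the resulting $\Sigma$ are exactly the sampled diagonal entries, so the matrix is symmetric positive definite whenever `low > 0`.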
Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning
We investigate a practical domain adaptation task, called source-free unsupervised domain adaptation (SFUDA), where the source pretrained model is adapted to the target domain without access to the source data. Existing techniques mainly leverage self-supervised pseudo-labeling to achieve class-wise global alignment [1] or rely on local structure extraction that encourages the feature consistency among neighborhoods [2]. While impressive progress has been made, both lines of methods have their own drawbacks: the "global" approach is sensitive to noisy labels while the "local" counterpart suffers from the source bias. In this paper, we present Divide and Contrast (DaC), a new paradigm for SFUDA that strives to connect the good ends of both worlds while bypassing their limitations. Based on the prediction confidence of the source model, DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals under an adaptive contrastive learning framework. Specifically, the source-like samples are utilized for learning global class clustering thanks to their relatively clean labels. The noisier target-specific data are harnessed at the instance level for learning the intrinsic local structures. We further align the source-like domain with the target-specific samples using a memory-based maximum mean discrepancy (MMD) loss to reduce the distribution mismatch. Extensive experiments on VisDA, Office-Home, and the more challenging DomainNet have verified the superior performance of DaC over current state-of-the-art approaches.
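The confidence-based division step can be illustrated with a short NumPy sketch. It is only a simplified reading of the abstract: the fixed `threshold` value, the use of the maximum softmax probability as the confidence score, and the function names are assumptions made here, and the actual DaC criterion may differ.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def divide_by_confidence(logits, threshold=0.95):
    """Split target samples using the source model's prediction confidence.

    Returns a boolean mask (True = source-like, False = target-specific)
    and the pseudo-labels. The threshold is a hypothetical choice.
    """
    probs = softmax(logits)
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    source_like = confidence >= threshold
    return source_like, pseudo_labels

# Two target samples: one confidently predicted, one nearly uniform.
logits = np.array([[10.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0]])
source_like, pseudo_labels = divide_by_confidence(logits)
```

Samples flagged as source-like would feed the class-level objective with their pseudo-labels, while the remaining target-specific samples would be handled at the instance level, as the abstract describes.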
Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability
The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations have greater practicality but are more difficult to transfer between models. In this paper, we experimentally and theoretically demonstrate that neural networks trained on the same dataset have more consistent performance in High-Sample-Density-Regions (HSDR) of each class than in low sample density regions. Therefore, in the targeted setting, adding perturbations towards the HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios.
Transfer Learning in Bayesian Optimization for Aircraft Design
Tfaily, Ali, Diouane, Youssef, Bartoli, Nathalie, Kokkolaras, Michael
The use of transfer learning within Bayesian optimization addresses the disadvantages of the so-called cold start problem by using source data to aid in the optimization of a target problem. We present a method that leverages an ensemble of surrogate models using transfer learning and integrates it in a constrained Bayesian optimization framework. We identify challenges particular to aircraft design optimization related to heterogeneous design variables and constraints. We propose the use of a partial-least-squares dimension reduction algorithm to address design space heterogeneity, and a meta-data surrogate selection method to address constraint heterogeneity. Numerical benchmark problems and an aircraft conceptual design optimization problem are used to demonstrate the proposed methods. Results show significant improvement in convergence in early optimization iterations compared to standard Bayesian optimization, with improved prediction accuracy for both objective and constraint surrogate models.
DALD: Improving Logits-based Detector without Logits from Black-box LLMs
The advent of Large Language Models (LLMs) has revolutionized text generation, producing outputs that closely mimic human writing. This blurring of lines between machine- and human-written text presents new challenges in distinguishing one from the other, a task further complicated by the frequent updates and closed nature of leading proprietary LLMs. Traditional logits-based detection methods leverage surrogate models for identifying LLM-generated content when the exact logits are unavailable from black-box LLMs. However, these methods grapple with the misalignment between the distributions of the surrogate and the often undisclosed target models, leading to performance degradation, particularly with the introduction of new, closed-source models. Furthermore, while current methodologies are generally effective when the source model is identified, they falter in scenarios where the model version remains unknown, or the test set comprises outputs from various source models.
A.1 Conjugate Derivations Cross-Entropy Loss: L(h,y) = cX
The losses are compared on three degrees of shift (easy, moderate and hard), which is controlled by the drifted distance of Gaussian clusters. Here we discuss the architecture chosen and the implementation details. Note that the task loss / surrogate loss function is used to update the meta-loss mϕ during meta-learning. The number of transformer layers and the hidden layers in MLP are selected from {1, 2}. We can see that the task loss barely affects the learnt meta loss.